INTRODUCTION


Optical computing (that is, the use of optical and electro-optical devices to perform mathematical computations such as matrix multiplication and the solution of simultaneous linear equations) is a subject of current interest. One of the main reasons is the possible application of such device technology to large-array processing, ideally in a parallel processing mode. The computing speeds attainable using optical components are the major factor in this quest.

There are basically two aspects of the problem. The first is the physics and technology of the devices that perform the manipulations via optical and/or electro-optical means. A considerable effort has been expended upon device development; it is safe to say that this program has now begun to bear fruit, in that several devices have shown capabilities that warrant optimism.

This brings us to the second aspect of the problem, the subject of the current research effort, namely the development of a theory of optical computing. For example, it is generally agreed that optical computing has an advantage over digital computing in situations where parallelism can be exploited. The canonical examples are matrix-vector multiplication and matrix-matrix multiplication. Generally, investigators in optical computing have taken algorithms directly from the standard numerical analysis literature and modified them for use in optical computing. The most successful example is matrix-matrix multiplication based on outer-product decomposition, as popularized by Athale and associates. If the matrices are both square and of size $n \times n$, then outer-product decomposition achieves a saving in computational time because the $n$ inner products can be evaluated concurrently. However, prior to the present investigation no one seems to have developed ab initio numerical algorithms specifically for use in optical computing by taking advantage of the fact that convolutions can be performed very rapidly. We have developed such an algorithm for matrix-matrix multiplication. It is outlined in Section 1. Our second completed contribution is the development of a tractable mathematical model of an optical system (assuming incoherent light operations) and its use in an investigation of the inherent limits of computation of such a system, in terms of a lower bound on the simultaneous resources of volume and computing time. This material is outlined in Section 2. Note that the material in these two sections will be submitted for publication in the near future. Work is still continuing on the influence of device uncertainty on parallel processing via optical computing.
AN  ALGORITHM  FOR  MATRIX-MATRIX
MULTIPLICATION  VIA  CONVOLUTION 
One  of  the  virtues  of  electro-optical  computing  is  the  ability  to  carry
out  convolution  operations  very  rapidly.  Given  this  technical  advantage,  it 
is  worthwhile  to  develop  an  algorithm  for  the  multiplication  of  two  rectangular 
matrices  using  convolution. 

To this end let us consider the matrix product $C = AB$, where $A$ is of size $n_1 \times n_2$, $B$ is of size $n_2 \times n_3$, and $C$ is of size $n_1 \times n_3$. Let the corresponding matrix elements be $a_{ij}$, $b_{jk}$, and $c_{ik}$. Associate with $A$ and $B$ the polynomials $P(x)$ and $Q(x)$, with $x$ being interpreted as an indeterminate:

$$P(x) = \sum_{s=0}^{(n_1-1)n_2n_3+n_2-1} p_s x^s \qquad (1)$$

$$Q(x) = \sum_{t=0}^{n_2n_3-1} q_t x^t \qquad (2)$$



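Note that the degree of $P(x)$ involves not only the size of $A$ through $n_1$ and $n_2$ but also the size of $B$ through $n_3$. The degree of $Q(x)$, on the other hand, involves only the size of $B$, namely $n_2$ and $n_3$. The $p$ and $q$ coefficients are related to the matrix elements of $A$ and $B$ by

$$p_s = a_{ij}, \quad \text{if } s = (i-1)n_2n_3 + j - 1 \qquad (3a)$$

$$p_s = 0, \quad \text{if } (i-1)n_2n_3 + n_2 \le s < i\,n_2n_3 \qquad (3b)$$

$$q_t = b_{jk}, \quad \text{if } t = kn_2 - j \qquad (4a)$$

$$q_t = 0, \quad \text{if } t \ge n_2n_3 \qquad (4b)$$

with $1 \le i \le n_1$, $1 \le j \le n_2$ and $1 \le k \le n_3$.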

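We claim that the elements of the matrix product $C$ are given by selected coefficients of the polynomial

$$R(x) = P(x)Q(x) = \sum_{m=0}^{n_1n_2n_3-1} r_m x^m, \qquad (5)$$

where

$$r_m = \sum_{s=0}^{m} p_s q_{m-s} \qquad (6)$$

is the discrete convolution of the $p$ and $q$ coefficients. Only the coefficients up to $m = n_1n_2n_3 - 1$ are needed; any higher-order terms of the product are not used. These selected $r_m$ are given by

$$r_m = c_{ik}, \quad \text{if } m = (i-1)n_2n_3 + kn_2 - 1. \qquad (7)$$
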


A formal proof (which is really a verification of the formulae) is now given. We begin by rewriting Eq. (6) in the form

$$r_m = \sum_{s+t=m} p_s q_t = \sum_{\alpha,\beta,\gamma,\delta} a_{ij} b_{jk}, \qquad (8)$$

where the summation in the second series is over:

$$(\alpha)\quad s = (i-1)n_2n_3 + j - 1$$

$$(\beta)\quad (i-1)n_2n_3 \le s < (i-1)n_2n_3 + n_2$$

$$(\gamma)\quad t = m - s = kn_2 - j$$

$$(\delta)\quad t < n_2n_3$$

The $\alpha$ term is simply Eq. (3a), while the $\beta$ term is the negation of Eq. (3b). The $\gamma$ term follows from Eq. (4a), while the $\delta$ term is the negation of Eq. (4b). Upon substitution of the $\alpha$ term into the $\beta$ inequality, we immediately see that this can only be true if

$$1 \le j \le n_2.$$

In like fashion, substitution of the $\gamma$ term into the $\delta$ inequality leads to the requirement that

$$m = (i-1)n_2n_3 + kn_2 - 1,$$

which is Eq. (7). (Adding the $\alpha$ and $\gamma$ expressions, the index $j$ cancels in $m = s + t$.) Thus the formulae are verified.
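
The verification above can also be checked by brute force for small sizes. The following is only a sketch: the function names `contributing_terms` and `formulae_hold` are ours and do not appear in the report. It enumerates every position $s$ that holds some $a_{i'j}$ and every position $t$ that holds some $b_{j'k'}$, and confirms that the pairs with $s + t = m$ are exactly the $n_2$ products $a_{ij} b_{jk}$ of the inner product for $c_{ik}$.

```python
def contributing_terms(n1, n2, n3, i, k):
    """List every (i', j, j', k') whose p_s and q_t land on r_m for c_ik.

    Indices are 1-based, as in the text: s follows the alpha term and
    t follows the gamma term.
    """
    m = (i - 1) * n2 * n3 + k * n2 - 1          # Eq. (7)
    terms = []
    for i2 in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            s = (i2 - 1) * n2 * n3 + j - 1      # position of a_{i2,j} in p
            for k2 in range(1, n3 + 1):
                for j2 in range(1, n2 + 1):
                    t = k2 * n2 - j2            # position of b_{j2,k2} in q
                    if s + t == m:
                        terms.append((i2, j, j2, k2))
    return terms


def formulae_hold(n1, n2, n3):
    """True when, for every (i, k), exactly the terms a_ij * b_jk contribute."""
    for i in range(1, n1 + 1):
        for k in range(1, n3 + 1):
            expected = [(i, j, j, k) for j in range(1, n2 + 1)]
            if contributing_terms(n1, n2, n3, i, k) != expected:
                return False
    return True
```

A call such as `formulae_hold(2, 2, 3)` covers the sizes of the illustrative example; a finite check of this kind supports, but of course does not replace, the general argument.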


A construction which leads to the various formulae for $p_s$ and $q_t$ in terms of $a_{ij}$ and $b_{jk}$ respectively uses row vectors. Consider a row vector $p$ whose elements we denote by $p_s$ (coefficients of the polynomial $P(x)$), composed of the matrix elements $a_{ij}$ of $A$ and strings of zeros as depicted in Fig. 1A. The range of $s$ is

$$0 \le s \le n_1n_2n_3 - n_2n_3 + n_2 - 1;$$

consequently

$$p_s = 0, \quad \text{if } s \ge (n_1-1)n_2n_3 + n_2 \qquad (13a)$$

$$p_s = 0, \quad \text{if } (i-1)n_2n_3 + n_2 \le s < i\,n_2n_3 \qquad (13b)$$

Furthermore the $p_s$ are related to the $a_{ij}$ as given by Eq. (3a), as the reader can verify by construction.

In like fashion, we construct another row vector $q$ with elements $q_t$ according to Fig. 1B. Unlike $p$, $q$ has no strings of zero elements. The range of $t$ is

$$0 \le t \le n_2n_3 - 1 \qquad (14)$$

so that

$$q_t = 0, \quad \text{if } t \ge n_2n_3.$$

Within the range of $t$, the $q_t$ are related to the $b_{jk}$ by

$$q_t = b_{jk}, \quad \text{if } t = (k-1)n_2 + n_2 - j,$$

which reduces to Eq. (4a).

As an illustrative example of the algorithm, consider the case where $A$ is $2 \times 2$ and $B$ is $2 \times 3$, so that $C$ is $2 \times 3$ (i.e., $n_1 = 2$, $n_2 = 2$, $n_3 = 3$). The upper limits on the polynomials $P$, $Q$ and $R$ are 7, 5, and 11, respectively. The $p_s$, $q_t$ and $r_m$ coefficients evaluated according to Eqs. (3), (4) and (7) are listed in Table 1. Upon carrying out the convolution operation, Eq. (6), in conjunction with this table we have:


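$$r_1 = a_{11}b_{11} + a_{12}b_{21} = c_{11}, \quad r_3 = a_{11}b_{12} + a_{12}b_{22} = c_{12}, \quad r_5 = a_{11}b_{13} + a_{12}b_{23} = c_{13},$$

$$r_7 = a_{21}b_{11} + a_{22}b_{21} = c_{21}, \quad r_9 = a_{21}b_{12} + a_{22}b_{22} = c_{22}, \quad r_{11} = a_{21}b_{13} + a_{22}b_{23} = c_{23}.$$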


These  are,  of  course,  the  matrix  elements  as  obtained  by  more  standard 


procedures. 

The implementation of the algorithm can be carried out in a straightforward fashion by re-examination of Figs. 1A and 1B. Note that the row vector $p$ in Fig. 1A consists of the rows of $A$ in which zeros are interspaced; the number of zeros is fixed. Thus we can easily handle the vector $p$ containing the matrix elements $a_{ij}$. The row vector $q$, containing the matrix elements $b_{jk}$, is simply the columns of $B$ in reverse order, see Fig. 1B. This vector is also easily handled in the implementation.
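
A minimal software sketch of the algorithm may help fix ideas. It is not the optical implementation: the convolution here is an ordinary double loop standing in for whatever device performs it, and the names `convolve` and `matmul_via_convolution` are ours. The index formulas are those of the text, shifted to 0-based indexing: $s = (i-1)n_2n_3 + j - 1$ for $p$, $t = kn_2 - j$ for $q$, and $m = (i-1)n_2n_3 + kn_2 - 1$ for the selected coefficients.

```python
def convolve(p, q):
    """Discrete convolution r_m = sum_s p_s * q_(m-s) of two sequences."""
    r = [0] * (len(p) + len(q) - 1)
    for s, ps in enumerate(p):
        if ps == 0:
            continue                    # skip the zero strings of p
        for t, qt in enumerate(q):
            r[s + t] += ps * qt
    return r


def matmul_via_convolution(A, B):
    """Multiply A (n1 x n2) by B (n2 x n3), given as lists of rows."""
    n1, n2, n3 = len(A), len(A[0]), len(B[0])
    if len(B) != n2:
        raise ValueError("inner dimensions of A and B do not match")
    stride = n2 * n3

    # p: each row of A followed by zeros, one row every n2*n3 positions.
    p = [0] * ((n1 - 1) * stride + n2)
    for i in range(n1):
        for j in range(n2):
            p[i * stride + j] = A[i][j]

    # q: element b_jk is stored at t = k*n2 - j (1-based j, k).
    q = [0] * stride
    for k in range(n3):
        for j in range(n2):
            q[(k + 1) * n2 - (j + 1)] = B[j][k]

    r = convolve(p, q)

    # c_ik is the coefficient r_m with m = (i-1)*n2*n3 + k*n2 - 1 (1-based).
    return [[r[i * stride + (k + 1) * n2 - 1] for k in range(n3)]
            for i in range(n1)]


C = matmul_via_convolution([[1, 2], [3, 4]], [[5, 6, 7], [8, 9, 10]])
# C == [[21, 24, 27], [47, 54, 61]]
```

For the $2 \times 2$ by $2 \times 3$ case this builds $p$ of length 8 and $q$ of length 6, and reads the six matrix elements from $r_1, r_3, r_5, r_7, r_9, r_{11}$.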


SECTION  TWO 

LOWER  BOUNDS  ON  THE  COMPUTATIONAL 
EFFICIENCY  OF  OPTICAL  COMPUTING  SYSTEMS 


The advent of Very Large Scale Integrated (VLSI) circuitry has led to a considerable decrease in the physical size of computers, with a corresponding increase in the speed of execution of operations. Basically there are three interrelated aspects to VLSI: design and fabrication of the chips, design of systems which use these chips for specific applications, and development of algorithms which utilize the inherent capabilities of such chips. The revolution in computer science, for both numerical and nonnumerical applications, brought about by VLSI continues unabated.

The computational limitations of VLSI were first investigated by Thompson [1]. For an introduction to this work see the basic text of Ullman [2], which contains references to subsequent work. It has been shown that any VLSI circuit with area $A$ and time $T$ requires at least $AT^2 = \Omega(n^2)$ to solve various computational problems such as the FFT, convolution, and $\ell \times \ell$ matrix multiplication, where $n = \ell^2$. The symbol $\Omega$ is defined in Ullman: $f(n) = \Omega(g(n))$ means that there exists a positive constant $c$ such that for an infinite number of values of $n$ we have $f(n) \ge c\,g(n)$.

Nevertheless, VLSI suffers from the limitation that the technology upon which it relies is inherently two-dimensional. Snyder's recent review [3] contains a very useful discussion of the constraints imposed by VLSI as regards planarity. In particular, conventional VLSI chips are constructed by superposing a small number of layers on top of a substrate. This substrate has a thickness which is orders of magnitude greater than the size of the transistors and the wire width. Input to and output from a conventional VLSI chip must be made by a limited number of pads located on the sides of the chip. VLSI chip technology is changing almost daily; however, some of the more basic aspects are discussed in Barbe [4] and Einspruch [5]. Although an ensemble of two-dimensional chips can be placed on top of each other with holes drilled down through them for interchip communication, the total number of layers is seriously limited by the substrate thickness of each chip; consequently the resulting device cannot properly be termed "three-dimensional VLSI". For this reason, it appears that truly three-dimensional VLSI will most likely not be possible to fabricate. Nevertheless, some interesting theoretical investigations of three-dimensional VLSI have been carried out: Rosenberg [6], Leighton and Rosenberg [7].


The purpose of the present communication is to summarize investigations into various aspects of the computational performance of three-dimensional devices which make hybrid use of electronic and optical components to perform operations. Our goal is to facilitate general statements on such electro-optical computations, with specific reference to lower bounds on their complexity. Since such devices may contain a large number of components, we term them VLSIO, with the O denoting optics.


We note that a very useful overview of optical computing (more properly electro-optical computing) may be found in Caulfield et al. [8].


In order to carry out such an analysis we outline the development of an abstract model of VLSIO which is essentially technology independent but incorporates the physical restrictions of light beam propagation as expounded by Gabor [9], especially with respect to the very important fact that the amount of information passing through a cube of small volume is bounded. This physical constraint allows us to adapt previous VLSI lower bound arguments to the VLSIO situation and allows for comparisons of electro-optical computing devices in terms of their volume $V$ and the time $T$ taken by VLSIO on a given input (= number of time units that elapse from the first input signal until the last output signal). We avoid making assumptions about the precise physics of the devices utilized; this would only limit the later application of these ideas as the physical models are improved and modified. Optical physics (through Gabor's theorem) implies an upper limit on the rate of information transfer across an optical beam, and hence a lower bound on the computational efficiency of VLSIO. In addition we assume that any 2-D convolution of an $n \times n$ array of points can be achieved by a VLSIO device in a unit time step. This assumption is reasonable because there already exist optical devices which do so.

Note that all the variables and functions are taken to be Boolean (i.e., the values of the variables are taken from $\{0, 1\}$).

We begin by discussing the well-known abstract two-dimensional model of a VLSI chip as an $L_1 \times L_2 \times L_3$ grid graph with height $L_3$ ($\ll L_1$ or $L_2$) held constant. The distance between grid points is $w$, the feature width. The chip processors are located at various distinct nodes of the grid graph, with each processor storing a state consisting of $b$ bits. Furthermore the processors execute synchronously on a step consisting of a time unit of duration $\tau$ seconds. The remaining nodes are used for wire routing, or for input and output pads. Each wire can run along a path in the grid graph from an input pad, or a processor, to various output pads, or processors. Wires are not allowed to intersect. On each time step, a value consisting of $b$ bits of information is transmitted across the wire grid from either an input pad or a processor. The state of each processor is then updated on each step by a fixed function of the values transmitted by the wires leading into the processor and of the state of the processor in the previous step. The unit-step transmission time across wires is justified by the fact that wire transmission can generally be made faster than transistor switching times. This remarkably simple model is sufficient to determine the computational efficiency of VLSI devices.

Following the two-dimensional version, the fundamental building block of our VLSIO device is the optical box $B$. It is a parallelepiped having lengths $L_1$, $L_2$ and $L_3$ with input and output faces $F_{in}$ and $F_{out}$. These faces are assumed to take as input and as output two-dimensional integer arrays $I(x,y)$ and $O(x,y)$ respectively. For convenience, we consider the input sources and output detectors to be very small compared to the size of the optical box (in order to minimize optical diffraction effects); furthermore they are uniformly spaced a distance $w$ apart. The input sources are taken to be LEDs (light-emitting diodes) and the detectors are unspecified except to state that they are sensitive only to the intensity of the LED radiation. We remind the reader that most electro-optical computations are now performed via incoherent, geometrical-optics-based processors and not by coherent, Fourier-transform-based processors. The ancillary optical equipment (lenses, prisms, gratings, etc.) which spreads and then collects the light can be neglected in this version of the abstract model.

The output array is computed on each time step with a duration $\tau$ as a fixed function $A$ of the input array; $A$ will, of course, depend upon the detailed optical characteristics of $B$.

The optical box, in addition to being three-dimensional, also differs from VLSI in another way; namely, optical beams rather than wires provide storage and cross-flow. Since the modus operandi is incoherent radiation, these beams can intersect without interacting. The basic question that now arises is: "to what extent do optical (laser) beams behave as wires?"

A wire can only transport information at a finite rate depending upon wire cross-section, skin effects, etc. We would also expect an optical beam to perform similarly, notwithstanding the greater information rate. This problem has already been addressed by Gabor [9], who studied the "metrical information" in a light beam. The conclusion that he draws is that a light beam always has a finite upper limit with respect to information rates, the upper limit depending upon the wavelength of light, the smallest effective beam area, the solid angle of divergence, etc. We need not concern ourselves with explicit formulae; for our purposes it suffices that we can interpret an optical beam as a wire.

Given  this  equivalence,  we  turn  to  the  important  problem  of  determining 
lower  bounds  (in  terms  of  simultaneous  volume  and  time)  on  the  computational 
resources  required  for  VLSIO  to  solve  various  problems. 

In order not to unduly lengthen the text, it is assumed that the reader is familiar with Sections 1.4, 2.1 and 2.2 of Ullman's basic text [2].


Consider a Boolean function $f$ with a set $X$ of $n$ input variables and a set $Y$ of $m$ output variables. Let $X'$ be a subset of $X$; also let $P = (X_L, X_R, Y_L, Y_R)$, where $X_L$, $X_R$ and $Y_L$, $Y_R$ are partitions of $X$ and $Y$ respectively. We term $P$ balanced if between one-third and two-thirds of $X'$ lies in $X_L$, and denote it by $P_b$. If $\alpha$ and $\beta$ are two input assignments, then we term them a fooling pair of assignments to $X$ if:

1) output $Y_L$ is distinct for input assignments $\alpha(X)$ and $\alpha(X_L)\beta(X_R)$

2) output $Y_R$ is distinct for input assignments $\beta(X)$ and $\alpha(X_L)\beta(X_R)$.

In addition, let the fooling set for $P$ be a set of assignments $A$ of $X$ such that for all distinct $\alpha, \beta \in A$, at least one of $(\alpha, \beta)$, $(\beta, \alpha)$ is a fooling pair.
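
The definitions can be made concrete with a small enumeration. The sketch below rests on two assumptions of ours: that the two listed conditions are alternatives (either one suffices for a fooling pair), and that assignments are bit tuples whose first `n_left` entries belong to $X_L$. The functions `swap_halves` and `constant` are hypothetical test functions, not taken from the report.

```python
from itertools import combinations, product
from math import log2


def is_fooling_pair(f, alpha, beta, n_left):
    """Check conditions 1) and 2) for the ordered pair (alpha, beta).

    f maps an assignment to the pair (Y_L, Y_R). Assumption: either
    condition alone is enough.
    """
    mixed = alpha[:n_left] + beta[n_left:]      # alpha(X_L) beta(X_R)
    yl_alpha, _ = f(alpha)
    _, yr_beta = f(beta)
    yl_mixed, yr_mixed = f(mixed)
    return yl_alpha != yl_mixed or yr_beta != yr_mixed


def is_fooling_set(f, assignments, n_left):
    """True if, for all distinct pairs, (a, b) or (b, a) is a fooling pair."""
    return all(
        is_fooling_pair(f, a, b, n_left) or is_fooling_pair(f, b, a, n_left)
        for a, b in combinations(assignments, 2)
    )


def swap_halves(x, n_left=2):
    """Hypothetical f: the left outputs copy X_R, the right outputs copy X_L."""
    return x[n_left:], x[:n_left]


def constant(x):
    """Hypothetical f whose outputs ignore the inputs entirely."""
    return (0,), (0,)


all_inputs = list(product((0, 1), repeat=4))    # n = 4, two inputs per side
swap_ok = is_fooling_set(swap_halves, all_inputs, 2)
const_ok = is_fooling_set(constant, all_inputs, 2)
bits = log2(len(all_inputs))
```

Under these assumptions all 16 assignments form a fooling set for `swap_halves`, giving $\log_2 16 = 4 = n$ bits for this balanced partition, whereas the constant function admits no fooling pair at all.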

Finally,  we  require  that  the  locations  and  times  of  the  input  and  output 
are  given  only  once. 

Crucial to the analysis is the concept of information content (essentially "the amount of information that must cross a boundary in order to solve the problem"). Formally the information content of the Boolean function $f$ is:

$$I_f = \max_{X' \subseteq X}\ \min_{P_b}\ \max_{A}\ \log_2 |A| \qquad (1)$$

where $A$ denotes the fooling set corresponding to $P_b$. The following functions (of importance in electro-optical computing) are known to have information content $I_f = \Omega(n)$:

a) $n$ point discrete Fourier transforms,

b) multiplication and inversion of two $\ell \times \ell$ matrices, where $n = \ell^2$,

c) $n$ point convolution.


The following important result on lower bounds is due to Thompson [1,2]: Any two-dimensional VLSI chip computing a Boolean function $f$ requires simultaneous area $A$ and time $T$ satisfying $AT^2 = \Omega(I_f^2)$.

We now prove: Any three-dimensional "optical box" computing a Boolean function $f$ requires simultaneous volume $V$ and time $T$ satisfying

$$VT^{3/2} = \Omega(I_f^{3/2}).$$

The proof (which we now sketch) is an adaptation of the two-dimensional technique. Let the device be a parallelepiped having dimensions $L_1 \le L_2 \le L_3$ with volume $V = L_1L_2L_3$. Choose $X'$ to be the subset of $X$ such that $I_f = I_f(X')$. For each $i = 1, 2, 3$ we can find a cut $C_i$ of area

$$A_i \le 2V/L_i, \quad i = 1, 2, 3 \qquad (2)$$

which disconnects the device into two components, each of which contains at most two-thirds, but no less than one-third, of the inputs of $X'$. By definition at least $I_f$ bits must be transported across each cut; since the information crossing a unit area per time step is bounded, this requires time

$$T = \Omega(I_f / A_i) = \Omega(I_f L_i / V), \quad i = 1, 2, 3.$$

Consequently, multiplying the three bounds and using $V = L_1L_2L_3$,

$$V^2 T^3 = \Omega(I_f^3), \quad \text{i.e.,} \quad VT^{3/2} = \Omega(I_f^{3/2}),$$

which is the sought-for result. The main point to emphasize is that this result depends upon the fact that we can treat light beams as if they were wires.

An immediate consequence of this theorem is that the lower bounds for optical computing are:

a) $n$ point convolution or $n$ point discrete Fourier transforms:

$$VT^{3/2} = \Omega(n^{3/2}) \qquad (6)$$

b) multiplication and inversion of two $\ell \times \ell$ matrices, where $n = \ell^2$:

$$VT^{3/2} = \Omega(\ell^3). \qquad (7)$$

These results follow from the statements quoted after Eq. (1). Equations (6) and (7) represent the lower bound performance of these two operations in terms of volume and time. It is important to remember that these bounds are a consequence of the fact that we allow the entire volume of $B$ to be operative.
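
To see what the two trade-offs imply, one can solve each bound for the time $T$ at fixed volume or area. The sketch below is only a scaling illustration: the constant `c` hidden in the $\Omega$ notation is unknown and is set to 1 purely by assumption, the function names are ours, and the $\Omega$ used here holds only for infinitely many $n$, so the numbers are not guarantees for any particular problem size.

```python
def min_time_optical(info_bits, volume, c=1.0):
    """Smallest T allowed by V * T**1.5 >= c * I**1.5 (c is an assumed constant)."""
    return (c * info_bits ** 1.5 / volume) ** (2.0 / 3.0)


def min_time_vlsi(info_bits, area, c=1.0):
    """Smallest T allowed by A * T**2 >= c * I**2 (c is an assumed constant)."""
    return (c * info_bits ** 2 / area) ** 0.5


# With I = 64 bits: an optical box of volume 8 and a chip of area 16
# (arbitrary units, c = 1) both give a time scale of about 16 steps.
t_optical = min_time_optical(64, 8)
t_vlsi = min_time_vlsi(64, 16)
```

Solving the bounds this way shows the different exponents: at fixed information content the optical bound scales as $T \propto V^{-2/3}$, while the planar bound scales as $T \propto A^{-1/2}$.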